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Abstract. The final step of most large-scale structure analyses involves the comparison of 
power spectra or correlation functions to theoretical models. It is clear that the theoretical 
models have parameter dependence, but frequently the measurements and the covariance 
matrix depend upon some of the parameters as well. We show that a very simple interpolation 
scheme from an unstructured mesh allows for an efficient way to include this parameter 
dependence self-consistently in the analysis at modest computational expense. We describe 
two schemes for covariance matrices. The scheme which uses the geometric structure of such 
matrices performs roughly twice as well as the simplest scheme, though both perform very 
well. 

Keywords: cosmological parameters from LSS - power spectrum - baryon acoustic oscilla¬ 
tions - galaxy clustering 


Contents 


1 Introduction 1 

2 Interpolation scheme 3 

3 Interpolating the data vector 5 

4 Interpolating covariance 6 

4.1 Background 6 

4.2 Diagonal example 7 

4.3 Matrix example 7 

4.4 More dimensions and simplices 11 

5 Conclusions 11 

A Neglecting the parameter dependence of the covariance 12 

B Distance measures for SPD matrices 13 

C Solving the geometric interpolation equation 14 


1 Introduction 

The study of large-scale structure - using the cosmic microwave background, galaxy and 
cluster surveys, weak lensing, 21cm-background fluctuations and other probes - promises to 
teach us a wealth of information about our Universe and theories of fundamental physics 
and for this reason has been the focus of intense community effort for many decades. The 
final stage in any cosmological data analysis, from which the key constraints and insights are 
derived, is the comparison of a summary statistic (or set of statistics) with the predictions of 
a theoretical model, and it is this stage which is the focus of this paper. Within cosmology 
the most common approach is to construct a likelihood function and derive limits on a set 
of cosmological parameters (or a class of theories) given the data. Frequently this is done 
with a Markov Chain Monte Carlo algorithm [1, 2] or nested sampling scheme [3]. What is 
desired then is the ability to compute the likelihood function of the data, given the model, as 
a function of the theoretical parameters. 

Within the large-scale structure community the most common assumption about the 
likelihood function is that it is Gaussian. This is inspired by the central limit theorem 
under the assumption that there are many modes contributing to each measurement. The 
approximation becomes more reliable as the data become better constraining, though its 
reliability can depend on the summary statistic being used to compare data with theory. 1 
For this paper, we shall assume that the Gaussian likelihood approximation is sufficient; we 
refer the reader to the existing literature on the scope and validity of this approximation for 
different probes. 

1 For example, it is likely to be a worse approximation for the power spectrum than for the correlation 
function, since the power spectrum in a bin is the sum of positive definite quantities. 


- 1 - 





We denote the parameters upon which our theory depends with p and write the data 
in vector form, di and expected value and covariance of the observations as m(p) and Cij(p). 
Within the Gaussian approximation, the likelihood is 


C{p) oc \C(p)\ 


- 1/2 


exp 


— 2 ( d Yp) -m(p)Y c (p ) 


-i 

u 


(dj(p) - Hj{p)) 


( 1 . 1 ) 


We emphasize that dependence on p exists in three places - (i) the prediction (p). (ii) the 
data vector di(p) and (iii) the covariance matrix Cij{p). Any self-consistent evaluation of the 
likelihood function needs to address all three dependences. The first of these is what might 
be described as the “modeling” or “prediction” phase and can be addressed with approaches 
from direct analytic calculation to detailed numerical simulations. We will assume that this 
dependence is addressed at the level of precision required for current and future surveys. Our 
purpose in this paper is to argue that the latter two dependences, often ignored in analyses, 
are also readily addressable. 

Motivated by studies of weak gravitational lensing, baryon acoustic oscillations (BAO) 
and redshift space distortions (RSD), we are interested in measuring the 2-point function of 
shear, galaxies, quasars or Lyman a absorption from a large (redshift) survey. To simplify 
the exposition we shall specialize to the case of a galaxy 2-point function, though most of our 
statements are intended to hold more generally. The summary statistic is thus the galaxy 
power spectrum or correlation function, or perhaps an integral of these (e.g. [4]). We have 
indicated a dependence of di on p since, given the observed positions of galaxies and the 
selection function, computation of the correlation function or power spectrum requires the 
assumption of a fiducial cosmology and this should ideally be consistent with the cosmological 
parameters being tested. Further model dependence is introduced if ‘weights’ are used in the 
calculation or if density field reconstruction [5] is used to sharpen the acoustic features in the 
clustering signal. 

In what follows we shall assume that /i t (p) is in hand. This is the ‘modeling’ phase of 
the problem, and is the subject of a large literature in itself. Recent comparisons of models 
focused on BAO and RSD can be found in [6-10]. The computation of Cij(p) is more complex. 
While the expression for Co v[P(k'), P(k”)] or Cov[£(r*), ^(i^ 7 )] within linear perturbation the¬ 
ory (assuming Gaussian statistics for the density fluctuations) is straightforward, deviations 
from linear theory can be important on the scales of interest for current and future surveys 
and properly handling the mask associated with a complex observing geometry is non-trivial. 
The situation becomes even more complex if we use a non-linear filter such as density field 
reconstruction (to sharpen the BAO peak) before computing the 2-point function. These 
complexities have led to analysts generating their covariance matrices through Monte Carlo 
simulation of a large number of mock catalogs. While ideally the covariance matrix is gen¬ 
erated for each and every parameter set of interest, this is computationally expensive. Even 
the more limited procedure of iteratively regenerating the covariance matrix for the best fit 
cosmology is not usually attempted. As we prepare for the next generation of surveys, it be¬ 
hooves us to rethink these steps. In what follows we shall advocate and investigate a low order 
interpolation scheme that provides a simple, approximate, route to including the parameter 
variations in di and Cij with moderate computational cost. 

Essentially our proposal is to generate an interpolator (or emulator, or response model) 
for the data and covariance matrix based on values pre-computed at certain points in the 
parameter space. This interpolator then allows the rapid computation of the likelihood for 
any cosmology, self-consistently including changes to the fiducial cosmology, bias and growth 
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factor. We implicitly assume that for current and future surveys the range of parameters being 
searched are sufficiently small that a low order interpolation is sufficient, and we investigate 
here a simple linear interpolation from an unstructured mesh. This is not the only choice, but 
it does allow for easy updating and refining of the mesh as the data improve and the attention 
is focussed on smaller regions of parameter space. Recently [11] presented an alternative 
procedure, aimed at the weak lensing case. This builds upon work presented in [12] which 
developed an interpolation for cosmological power spectra derived from N-body simulations. 
Some of the techniques introduced or refined in those papers could be of use in our context, 
but we have found that the simple situation described below performs quite well in this 
context so the more sophisticated approach may not be needed. 

The outline of the paper is as follows. In the next section we describe the basics of 
our interpolation scheme. We describe the simple case of interpolation of the data vector, 
e.g. £e(ri) or P(ki), in section 3. Section 4 describes how one can interpolate the covariance 
matrix, or its inverse the precision matrix. This situation is more complex, so we divide this 
section into several subsections and relegate some technical details to appendices. We finish 
with a summary of our main results and directions for future work in section 5. 

2 Interpolation scheme 

Below we shall discuss how to compute the data and covariance at particular points in the 
parameter space, but first we introduce some notation which will be used in each case. To 
begin we distinguish between ‘fast parameters’, which may enter the data, model or covariance 
matrix but do not require any difficult recomputations and ‘slow’ parameters which need 
significant calculation. We need only interpolate in the ‘slow’ parameters, and henceforth we 
shall restrict ourselves to this set. We shall assume there are n p such parameters. We shall 
also assume that we are interpolating a single object (e.g. the total covariance matrix rather 
than the sample variance and shot-noise pieces of it) though the generalization to multiple 
objects is trivial and splitting the data or covariance matrix may be beneficial. 

There are numerous methods for performing interpolation (schemes which are particu¬ 
larly relevant to our situation are kernel interpolation and tensor B-splines) and we do not 
need to use the same method for each component of the problem. However doing so simplifies 
the discussion. Perhaps the simplest technique is multilinear interpolation (from an unstruc¬ 
tured grid) and we shall use this method below as an illustrative example. The reasons are 
twofold. First, we expect changes to our data and covariance to be small and smooth, so 
a low-dimensional method should be adequate and very easy to code. Second, by using an 
unstructured mesh it is easy to later add additional interpolation points if they become avail¬ 
able. Adding additional points close to the regions of high likelihood can be more valuable 
than more complex interpolation schemes. Of course, many of the points we discuss below 
can be adapted to more complex interpolation schemes if necessary. 

To interpolate from an unstructured mesh we need our functions evaluated for at least 
n p + 1 points. To begin let us imagine that we have the data, model or covariance evaluated 
at n p + 1 points in the parameter space, chosen such that the simplex they define encloses 
the entire high-likelihood region. 2 We can trivially interpolate to any interior point using 
(homogeneous) Mobius barycentric coordinates, also known as areal coordinates [e.g. 13]. 
Given a point, p, interior to the simplex defined by {pi}, solve for the areal coordinates Xi 
which satisfy p = XiPi and x % = 1 (this is a simple linear transformation). If the point 

2 We note that this is very unlikely to be the best distribution, but serves to introduce the idea. 
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Figure 1. How the correlation function changes with distance scale. In the upper panel the lines 
show the average redshift-space monopole and quadrupole moments of the correlation function of 
halos from two N-body simulations assuming a distance redshift relation appropriate to ACDM with 
= 0.25 (red), 0.275 (light grey) and 0.30 (blue). The distance to z = 1 varies by 3 per cent for these 
cosmologies. The halos are taken from the z = 1 outputs of two simulations with the same cosmology 
but different initial conditions and observed from each of the 8 corners as if along a lightcone. The 
halos have masses 5 x 10 12 < M < 10 13 and form 16 realizations of an octant of the sky cut 

to 0.8 < 2 < 1.2. Note that the correlation function assuming Q rn = 0.275 can be described to better 
than 10 per cent by the linear interpolation of the 0.25 and 0.30 results except where it crosses zero. 
The lower panel shows the ratio of the linear interpolation to the 0.275 result. The light and dark 
grey bands show the la error on the mean (from the scatter of the 16 samples) of the fl m = 0.275 
value for the quadrupole and monopole respectively. The interpolation error is below this for all scales 
even for such a large interpolation. 


is interior to the simplex then 0 < Xi < 1 for all i, and this can be used as a test for an 
interior point. Then the interpolated function, F. is simply )Tb Xjiq, where the Fj are the 
data, model or covariance evaluated at parameters p t . This technique is frequently referred 
to as linear Lagrange interpolation. 

A better scheme would add at least one additional point, close to a prior guess of the 
maximum likelihood parameters. Adding further points in this region is desirable, with the 
number limited primarily by computational resources. There are three criteria which could 
determine where to place additional points: (1) place points where the likelihood is high, 
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or (2) in regions where the linear interpolation performs the worst, or (3) near the edges of 
the parameter space (if the simplices so defined prove advantageous). The best distribution 
of points is likely to be dependent upon the specifics of the problem. In the presence of 
these additional points the interpolation begins by first finding the n p + 1 points closest 
to the point, p, of interest whose simplex 3 * * * encloses p. Using these n p + 1 points, find the 
areal coordinates, Xi and compute Xji 7 ) as before. We note in passing that parameter 
redefinitions may improve the performance of any interpolation scheme and that if we have a 
metric on the parameter space we can define ‘near’ and ‘far’ in terms of an approximation to 
a prior likelihood, but we shall not pursue this route here. We also note that while we have 
pursued interpolation from an unstructured mesh, many of the techniques we describe below 
can be useful in other interpolation schemes. 

3 Interpolating the data vector 

Typically one treats the data vector as fixed when doing cosmological inference, but in princi¬ 
ple it can depend upon the cosmological model if that model includes changes in the distance- 
redshift relation or if complex processes such as density field reconstruction are applied to the 
data prior to computing the power spectrum or correlation function. For the range of cos¬ 
mologies usually considered the changes in the distance-redshift relation are relatively small, 
and smooth. A similar smoothness applies for density field reconstruction. Thus we expect 
any changes in our 2-point function to be smooth functions of the input parameters and hope 
that they can be linearly interpolated with little error. 

To illustrate this point we take the positions halos from two large N-body simulations, 
viewed from each corner of the boxes as if they filled an octant on the sky. Sixteen samples 
of halos were constructed, each in a shell 0.8 < z < 1.2. Holding the angular and redshift 
coordinates appropriate to the ‘true’ cosmology (as assumed in running the simulation) we 
convert to distances in three other cosmologies with different distance-redshift relations and 
then compute the halo auto-correlation function. For simplicity we take the distance-redshift 
relation to be that of ACDM cosmologies with f l m = 0.25, 0.275 and 0.3 though this is just 
for illustration and when analyzing data it may make more sense to use a scheme such as 
outlined in ref. [15]. The results are shown in figure 2 where we see that the changes are 
smooth functions of the input parameters and linear interpolation from the endpoints results 
in errors in the interior smaller than the observational error. The variation in currently allowed 
cosmologies is smaller than this, and thus the interpolation should perform even better when 
used in cosmological inference, especially if one of the places where the data are evaluated is 
very close to the best-fitting cosmology. For current and future surveys there is little cost to 
computing the correlation function or power spectrum of the data for a number of different 
fiducial cosmologies, so the interpolation can be made almost arbitrarily precise. For this 
reason we shall not investigate this in detail, and turn instead to the more complex step of 
interpolating the covariance matrix. This is frequently the more computationally demanding 
calculation and is also the least straightforward conceptually. 

3 This is often called the ‘enclosing Delaunay simplex’, and it can be shown that the Delaunay triangulation 

gives rise to the piecewise linear approximation with the smallest maximum error at each point over functions 
with bounded curvature [14]. Since finding the bounding simplex is such a common problem (e.g. in collision 

detection and robot motion planning or computer graphics) there is a large literature on the subject and 

many well developed packages that can be used. We have used the routines in scipy. interpolate and 

scipy. spatial. 
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4 Interpolating covariance 


In large-scale structure analyses it is often the case that the covariance matrix is fixed through¬ 
out the analysis, usually at a fiducial cosmology where it was determined either through 
analytic means or, most often, Monte Carlo simulation. Computation of the covariance ma¬ 
trix from Monte Carlo simulation is often the most computationally demanding step in the 
analysis. Using the wrong covariance matrix won’t lead to a bias on average, but it may 
alter the confidence levels (see e.g. ref. [16] for a recent study and see [11] for a discussion 
of computing parameter dependent covariance matrices in the weak lensing case). In the 
limit that the fluctuations are Gaussian, the dynamics given by linear perturbation theory, 
the tracer bias is scale independent and the 2-point functions are very well measured, the 
covariance matrix will be essentially fixed and all models which are close to the data will have 
very similar covariance matrices. Also, for sufficiently constraining data the assumption of a 
fixed covariance matrix is relatively good (see discussion in Appendix A). Violations of any 
of these assumptions (e.g. not well constrained correlation functions on large scales or at high 
redshift, non-linear contributions to the dynamics, violations of the scale-independent bias 
assumption or inclusion of the trispectrum on small scales) can lead to relevent parameter 
dependence of C in the high likelihood regions. In this regard it is worth noting that it is 
reasonably straightforward to include the parameter dependence of C if C(p) or its inverse 
can be evaluated (possibly with regularization, possible simultaneously [17]) at a few loca¬ 
tions and the parameter dependence is smooth or small. If the likelihood is insensitive to 
changes in C with p it is even less sensitive to small interpolation errors when including that 
dependence. The decision on whether to invest the resources to allow interpolation of the 
covariance matrix will require a case-by-case analysis. 


4.1 Background 

Both the covariance matrix, C, and its inverse, the precision matrix, are symmetric, positive- 
definite (SPD) matrices. Given their importance in so many branches of engineering and the 
sciences it is no surprise that there is a vast literature on the properties of SPD matrices. 
The subset of SPD matrices has a rich mathematical structure, being a manifold with tan¬ 
gent vectors, geodesics and a metric (e.g. [18]). In particular, there are several methods for 
performing interpolation between such matrices at discretely sampled points in parameter 
space. All such interpolations implicitly depend upon a measure of distance between matri¬ 
ces, and there are several such distance measures. We review some of these measures, and 
the associated interpolations and averages, in Appendix B, and here focus on the results. 

Consider linearly interpolating between an SPD matrix Mq at t = 0 and M\ at t = 1. The 
arithmetic interpolation is trivially Alt = (1 — t)M ,o + tM\ and this is also SPD for 0 < t < 1 
since the SPD matrices form a convex cone. The geometric interpolation is almost as simple, 
Mt = Mq{Mq 1 Mi) 1 or the symmetric form Mt = Mq^(M q 1//2 MiM 0 1 ^ 2 )*Mq^ 2 , where the 
matrix power can be easily done in the diagonal basis. 4 In fact precisely this scheme is 
commonly used in computer graphics to interpolate perspective changes and rotations (using 
quaternions in place of rotation matrices for simplicity). Unlike arithmetic interpolation, 


4 Any SPD matrix can be written M = UDU T with D diagonal and Du > 0, so for any analytic function 
/ we have f(M) = Uf (D)U T . We recognize D as the matrix formed from the eigenvalues and U the matrix 
formed from the eigenvectors of M. 
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geometric interpolation gives an SPD matrix for all t and so can also be used for extrapolation 
although we won’t make use of that feature here. 0 

The generalization to multidimensional interpolation, as described above, is straightfor¬ 
ward. The arithmetic mean is trivially m~ ls ^j k M] t . The geometric mean is not quite so 
simple, since the matrices may not all be diagonal in the same basis, however the geometric 
mean can be defined as the unique solution, M, to Log (M k 1 M) = 0. Interpolation 
then corresponds to including weights, Ui, given by the areal coordinates of the point in the 
simplex, i.e. solving M = ^2 k UkMk or Ylk Uk Log (M^M) = 0 (see e.g. [19] for further 
discussion). We discuss several techniques for solving this second equation in Appendix C. 

4.2 Diagonal example 

Let us illustrate these ideas with some simple examples. In Gaussian, linear theory, if we work 
in &-space, the matrices are diagonal and all of these operations become trivial. A typical 
Cov in this limit is 

Cov [Pi, Pj} = €i (Pi ± Wi ) 2 5ij (4.1) 

where Pi is the ith bandpower, e* = 2/ N t is a small number when the number of modes 
per bin, A); = 14 urv ( 47 rA:?Afc), is large and Wi = 1/h is the amplitude of the white- or shot- 
noise component. In the simplest biasing scheme, and neglecting redshift-space distortions, 

Pi = b 2 P ma ,t(ki)- 

Let us consider the simplest 2D scheme where we vary just b and w and interpolate 
within the simplex ( b 2 = b\,w = wi), (b 2 = b 2 ,w = w\) and (b 2 = b\,w = u^). If we 
were to break equation 4.1 into its 3 parts then each could be interpolated simply, i.e. both 
parameters would be ‘fast’. We shall instead consider interpolating the square of the sum as 
a single block, treating both b and w as ‘slow’ parameters. In a modern survey we typically 
know b and w to order 10-20 per cent, so we will choose ±20 per cent as the range over which 
to vary b 2 and ±20 per cent as the range over which to vary w. For this situation, and for 
typical values of b and w, we find that both arithmetic and geometric interpolation recover 
the actual result to about a per cent. The deviation is largest furthest from the vertices of 
the simplex, and when b 2 P ~ w, as expected. An example is shown in figure 2. We have 
chosen to interpolate the parameters linearly in b 2 and w, though we could also have chosen to 
interpolate in log b 2 and log w or any mixture. The performance is not particularly dependent 
on this choice, but for the configuration shown linear-linear interpolation worked very well. 

Although this example is somewhat artificial (we did not split the covariance matrix into 
its component parts and we used diagonal matrices) two characteristics can be seen. First, 
we see some preference for the geometric interpolation over the arithmetic one. Second, both 
perform quite well over the range of parameters we may expect to explore. 

4.3 Matrix example 

This exercise was artificial in dealing with diagonal matrices. As a next step let us consider 
interpolation of non-sparse matrices. For this example we use the covariance matrix for the 
multipoles of the correlation function, computed in linear theory assuming Gaussian statistics 
and normalized by the survey volume, V. Motivated by the example of fitting baryon acoustic 
oscillations and redshift-space distortions we compute the covariance of £i(s) in linear theory 

5 In [11], the authors used a related property of SPD matrices to allow unconstrained interpolation over 
their parameter space without violating the SPD property. 
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Figure 2. An example of the deviation obtained using arithmetic or geometric interpolation within 
a 2D simplex (i.e. a triangle) for a single element of the precision matrix corresponding to equation 
4.1. The grey scale shows n es t/^true — 1- The lower left triangle shows the results for arithmetic 
interpolation, while the upper right triangle shows the results for geometric interpolation (with the 
upper right vertex being identified with the lower left). In both cases we interpolate the parameters 
linearly in b 2 and w. We scan the white noise level from below to above the cosmological power 
since the interpolations become exact in the limit that either term dominates. The dark band in 
the lower triangle indicates the worst-case scenario where the shot-noise and cosmological power are 
approximately equal. 


in 20 bins of s running from 25 h 1 Mpc to 125 h x Mpc for l = 0 and 1 = 2. This results in a 
40 x 40 covariance matrix with neighboring bins quite highly correlated. 

We interpolate the full matrix (i.e. not dividing it into shot-noise and sample variance 
terms or pre-whitening it in any way) in b 2 holding the shot-noise fixed and compare the 
interpolated value to the exact calculation. The result is shown in figure 3 where we see a clear 
preference for geometric interpolation over arithmetic interpolation but excellent performance 
from both. The metric used to compare the interpolated, n X n precision matrix to that 
evaluated explicitly is n~ l \\C Af2| |jr, where || ■ ||_f indicates the Frobenius norm (see Appendix 
B). This is essentially the rms (averaged over elements) deviation from the identity of £i e st 
times Ctrue- We see that interpolation induces sub-percent rms deviations even over such a 
broad range. For comparison, using a fixed covariance matrix across the range leads to an 
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b 2 


Figure 3. An example of interpolation in one dimension, in this case the large-scale bias, using 
arithmetic interpolation and geometric interpolation. The covariance matrix is 40 x 40, containing 
entries for £o and £2 evaluated at 20 points between s = 25/i _1 Mpc and 125/i -1 Mpc using linear 
theory with n = 3 x 10 -4 h 3 Mpc~ 3 . The interpolated n x n precision matrix, Q, is compared to the 
true one using the metric n _1 ||C' Af2||p. Note the sub-percent deviation even when interpolating over 
such a broad range in bias at fixed shot-noise. For comparison, using a fixed covariance matrix across 
the range leads to an order of magnitude larger deviation by the same metric. 


order of magnitude larger difference at the endpoints by the same metric. If we interpolate the 
correlation matrix (i.e. the covariance matrix normalized to unity along the diagonals) then 
the deviation is approximately halved, but of course we must then additionally interpolate 
the scaling of the diagonal entries. 

Next we consider interpolating in b 2 and h. In this case we interpolate over the range 
1.25 < b < 1.75 and 2.5 X 10~ 4 < n < 3.5 x 10~ 4 /i 3 Mpc~ 3 . The results are shown in figure 4 
with the same metric as in figure 3. In this situation the preference for geometric interpolation 
over arithmetic interpolation is less clear, though we do find that the geometric interpolation 
does tend to perform better with the chosen metric. Again the absolute performance of both 
the arithmetic and geometric schemes is very good. If we had used a fixed covariance matrix 
(e.g. the one at the lower left corner) the metric at the worst point is 10 times larger than 
the worst-case shown in figure 4. 

An alternative metric for evaluating how well we have estimated Q is to compute 
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Figure 4. An example of interpolation in two dimensions (the large-scale bias and the number 
density). The covariance matrix is 40 x 40, containing entries for £ 0 and £2 evaluated at 20 points 
between s = 25ft _1 Mpc and 125/) _1 Mpc computed using linear theory. The greyscale compares 
the interpolated n x n precision matrix, 12, to the true one using the metric n~ 1 \\C The 

lower right half gives the results for arithmetic interpolation while the upper right gives the results 
for geometric interpolation (with the upper right vertex identified with the lower left). See text for 
further discussion. 


tr[Ct me ^est] — n where C and 17 are n x n matrices. If the data are drawn from a Gaus¬ 
sian distribution, centered on the theory and with covariance Ct rue , and if y 2 is computed 
using 17 est for the precision matrix then the average value of y 2 is tr[Ct rue I7 es t] • For the case 
considered in figure 4, tr[Cp-ue^est] — n is never larger than about 0.03 n and usually consid¬ 
erably less. This can be compared to the \[2n change in y 2 required for ale change in the 
likelihood. On average, then, the error made by using the interpolated 17 in place of the ‘true’ 
17 in this case is negligible compared to the statistical uncertainties in the measurement and 
probably small compared to the errors in determining the Ck by Monte Carlo at each of the 
control points. We have investigated a number of other metrics for comparing matrices, and 
find a similar story by each measure. 
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4.4 More dimensions and simplices 

As a final example we consider interpolating the covariance matrix for the multipoles of the 
correlation function, as above, in a 4-parameter space and over multiple simplices. We allow 
b and n to vary, as above, but now additionally include variations in the linear growth rate, 
/, and a dilation parameter a. The large-scale bias and shot-noise affect the amplitudes 
of the two contributions to the covariance, the growth rate alters the the quadrupole to 
monopole ratio and the dilation parameter shifts the entire correlation function in scale. 
Thus this parameter set allows considerable freedom in the structure of the matrices we are 
interpolating over. 

We take the range of the parameters to be 1.25 < b < 1.75, 2.5 < n < 3.5 x 
10 -4 h 3 Mpc^ 3 , 0.74 < / < 0.76 and 0.98 < a < 1.02. To start, we evaluate the covariance 
matrices at points distributed throughout the volume (see below). Then we throw points at 
random within the space. For each point we find its enclosing simplex, compute the areal 
coordinates within that simplex and perform geometric interpolation from the 5 vertices of 
the simplex. Using the same metric as before, we find that the performance is excellent. 

We have investigated several distributions of points, finding that they all perform quite 
well. First we placed the points at the vertices of a hypercube bounding the parameter set 
(i.e. picking either the upper or lower limits, above, for each parameter), plus an additional 
point at the center for a total of 17 points. The rms deviation of Ct me fiest from the identity 
matrix for randomly thrown points was sub-percent. We also investigated using orthogonal 
arrays of points with 8 or 9 vertices (specifically OA(8,4,2,3) and OA(9,4,3,2) purely as il¬ 
lustrative examples). Within the region bounded by the points the rms deviation as again 
below a percent. Finally we investigated a 9 point configuration with a central point and 8 
points where each parameter was individually varied to the edges of its range while the others 
were held fixed at the central point. Again we found that the interpolation performed very 
well for the enclosed points. For every case we investigated with approximately 10 points, 
sub-percent rms deviations were found throughout the region enclosed by the points. 

Our conclusion is thus that low order interpolation from an unstructured mesh, while it 
may not be optimal, is very likely sufficient for upcoming surveys focused on BAO and RSD 
analyses of the power spectrum and correlation function. Using an interpolated precision 
matrix can reduce the error in the likelihood, compared to holding the precision matrix 
fixed, by a non-trivial factor and even a relatively coarse set of points provides an accurate 
interpolation. We have not considered the interesting problem of optimizing the position of 
the points in the mesh. Any such solution is likely to be problem specific, and to depend on 
the nature of the matrices being interpolated. Instead we have shown that the performance of 
our simple interpolation is not strongly dependent on the choice of mesh points. Finally, it is 
very conceivable that a more sophisticated interpolation approach could be devised and that it 
would save computational resources. The geometric interpolation method we have advocated 
has a simple generalization to techniques such a kriging, inverse distance weighting or kernel 
estimation. We leave such investigations to future work. 

5 Conclusions 

The study of large-scale structure has become a powerful means of learning about cosmology 
and fundamental physics. Analyses of large-scale structure data sets have become increasingly 
complex and sophisticated as the data themselves have grown in size and constraining power. 
In this paper we have focused on the comparison of theoretical models to the most common 
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summary statistics derived from such data: the redshift-space power spectrum and correla¬ 
tion function. Under the assumption of a Gaussian likelihood function the computation of 
statistical confidence levels or goodness-of-fit of models requires knowledge of the theory, the 
data and the covariance matrix. In principle all three can depend on the parameters being 
tested though the full dependence is not usually included. We have shown that a very simple 
interpolation scheme from an unstructured mesh allows for an efficient way to include this 
parameter dependence self-consistently in the analysis at modest computational expense. 

We have advocated multi-linear interpolation, under the assumption that the range of 
parameters being explored is relatively small and the changes smooth. For vector-valued 
quantities, such as the data vector of P(ki ) or ^(sj), the interpolation is straightforward. For 
matrix-valued quantities the best interpolation scheme is not as obvious. We have compared 
two, arithmetic interpolation and geometric interpolation, and found them to perform well 
though the geometric scheme (which uses the geometric structure of the group) can have up to 
half of the deviation of the simpler arithmetic scheme. As an illustrative example we showed 
that, for reasonable variations in shot-noise and large-scale bias, sub-percent deviations from 
interpolation could be achieved in predicting the linear theory covariance matrix at any point 
interior to the simplex. The interpolation introduces errors significantly smaller than the 
statistical uncertainty. 

This work assumes that the data and covariance matrix have been computed at the 
simplex points. We leave to future work an investigation of the optimal mesh placement 
and efficient ways to compute the covariance matrices by conditioning the sample covariance 
derived from Monte Carlo simulation. 
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A Neglecting the parameter dependence of the covariance 

For large data sets, with well constrained parameters, it is often possible to neglect the 
parameter dependence of the covariance matrix compared to that of the mean, or theoretical 
prediction. This is obviously the case if the covariance matrix is independent of the parameters 
(e.g. the shot-noise limit for galaxy surveys) but can be true more generally. In this situation 
performing a simple y 2 analysis is enough. In this appendix we quantify how neglecting the 
parameter dependence of the covariance matrix affects the likelihood in some simple cases, 
always assuming that the ‘fixed’ covariance matrix is chosen to be close to that of the best 
fit model. For a discussion of how an incorrect covariance matrix affects the likelihood, see 
e.g. [11, 16]. 

As in the main text, let us assume a Gaussian form for the likelihood of the theory, p, 
given data, d, with covariance C\ 

— 21n£ = (d — fi) T C~ 1 (d — n) + trlnC + • • • (A.l) 

then the parameter dependence can enter in the mean, /i(p), or variance, C(p). Obviously we 
cannot make a generic statement about which is the dominant dependence, since e.g. a priori 
we could have p or C be p-independent. For the case of galaxy surveys however we can make 
considerable progress (see [20] for very related discussion and [21] for an investigation for the 
case of BAO and [22] for an investigation for the case of cosmic shear). 
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Let us consider how the likelihood function falls from its peak as we vary the parameters 
away from the best-fit point. For analytic simplicity and in order to bring out the key points we 
shall work in linear theory and use a /c-space basis. Without loss of generality we can take our 
‘parameters’ to be bandpowers in a fc-bin, pi, with the measurements of the power spectrum in 
the bin being Pi = P(ki). With these assumptions p = p % while Cov[Pi, Pj] = e/pi + Wi) 2 5ij 
where e* = 2/TV* is a small number when the number of independent modes per bin, IVj, is 
large and we have written the white- or shot-noise component as re*. Our covariance matrix 
is most parameter dependent when Wi = 0, or we are in the ‘sample variance limit’. We shall 
therefore make this assumption. Note that the covariance is diagonal in fc-space in linear 
theory, so the log-likelihood becomes 


- 21n£ = 5>|C«| + + ... = ill + ... (A.2) 

t i 1 

where • • • represents terms independent of p % . Note how the first term (from the variation 
of the covariance matrix) is independent of e* while the second becomes increasingly more 
important as e* —> 0. For large amounts of data, small e*, the parameter dependence of 
the covariance term only becomes important once we are far from the peak of the likelihood 
function, i.e. for models which are disfavored by the data. To see this, note that for a model 
to be allowed p t must be close to Pi, i.e. within s/e. For such variations, we can compute 
the difference between allowing the covariance to vary and holding it fixed at the peak of 
the likelihood. This difference scales as s/e. The parameter dependence of the covariance 
therefore only becomes important for models disfavored by the data. 

An alternative approach [20] is to look at the Fisher matrix, i.e. the expectation value 
of the Hessian of the log-likelihood. For a Gaussian this becomes 

F a/3 oc tr [C^ 1 C ta C^ 1 C^\ + p, a C~ V,/3 (A.3) 

where , a denotes a derivative with respect to parameter p a and we have chosen Greek indices 
to label the parameters which can now be arbitrary. The Fisher matrix represents the curva¬ 
ture of the likelihood around the peak, assuming ‘typical’ data. Once again, note that the first 
term (arising from the parameter-dependence of the covariance) is independent of the overall 
normalization of C while the second term scales as the inverse of the normalization. It is 
easy to show that for bandpowers as parameters and the same assumptions as above the first 
term is 0('/2 i p^ 2 ) while the second is A) times larger and so dominates in the large-volume, 
high-precision limit. 


B Distance measures for SPD matrices 

In this appendix we quickly review some common distance measures on the space of symmet¬ 
ric, positive-definite (SPD) matrices. 

The most basic distance measure is the Frobenius norm (sometimes called the Hilbert- 
Schmidt norm). The Frobenius norm is the one induced by the inner product on matrices 
thought of as a vector space: (A, B) = tr(AB r ). If we work with symmetric matrices, 
11 A| | = y / tr(A 2 ). 
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We can also use the Kullback-Liebler divergence (or information gain). The KL diver¬ 
gence 6 between two zero-mean Gaussians with covariances C a and C b is 

D K L(C a \\C b ) = 1 [tr (C a C^) - logdet [C a C^) - N] (B.l) 

with C a and C b N x N matrices and D > 0 with equality only if C a = C b - A distance measure 
can be constructed from Drl by symmetrizing the arguments. If Aj are the eigenvalues of 
C~ 1 C b then 

i N / i \2 

D%%(C a ,C b ) = (B.2) 

and the square root of is a distance measure which is invariant under congruent trans¬ 

formations and inversion. 

Finally we can consider the space of positive-definite symmetric matrices as a manifold, 
with a Riemannian metric defined by the Frobenius dot product on the tangent space at any 
point. The geodesic distance between matrices C a and C b then becomes [e.g. 19] 


DR(C a , Cb) 


[|Log(C'- 1 C' b )[| jf , 


N 


N 


lo s 2 A * 


(B.3) 


where the Aj in the last equality are the eigenvalues of the matrix C~ 1 C b - Note that for any 
positive-definite symmetric matrix M we can decompose it into a diagonal and orthogonal 
matrix as M = UDU T with D containing the eigenvalues and U the eigenvectors of M. In 
this basis taking LogAf simply becomes taking the log of the diagonal entries, i.e. LogM = 
Uln(D)U T . 

Each of these different distance measures naturally lead to a different interpolation 
scheme. Consider for example taking the ‘average’. Under the Frobenius norm the arithmetic 
average, C, of the matrices C k minimizes the squared distance: \ C — Ck\ ■ The geodesic 

distance leads to geometric 7 means of matrices as the appropriate average, while the KL- 
based distance gives the geometric mean of the arithmetic and harmonic means [24]. Similar 
situations arise when performing interpolation, as described in the main text. 


C Solving the geometric interpolation equation 

Suppose we are given the values of the covariance matrix at the m = n p + 1 corners of a 
simplex, Ck, and the areal coordinates Uk corresponding to the (interior) point for which we 
want the interpolated value of the precision matrix, U. In order to interpolate geometrically 
we need to find the matrix Q which solves 

m 

c(n) = Y,u k Log(c k n) = o . (c.i) 

k =1 

®A divergence is a generalization of a metric which need not be symmetric or satisfy the triangle inequality. 
Formally a divergence on a space X is a non-negative function on the Cartesian product space, X x X, which 
is zero only on the diagonal. 

'It is easy to see why this is heuristically. For a Lie manifold the tangent vectors are isomorphic to the 
group generators. Since group generators and elements are related by exponentiation, the vectors are logs of 
matrices and lengths along parallel transported vectors are ‘logarithmic’ which converts an arithmetic mean 
into a geometric one. In fact the geodesic running from the origin to a group element G is simply exp[f LogG] 
for t € [0,1]. The geodesic running from P to Q is P 1 A(p _1 / 2 Qp _1 / i ) t p 1 / 2 . See [23] for a recent discussion 
of covariance matrices within the context of manifolds and a list of references. 
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One method would be to use Newton’s algorithm for matrix valued functions of matrices. 
This requires the (Frechet) derivative of G. Fast numerical algorithms for computing both 
the logarithm 8 and its derivative exist [25], but are complex. A more standard approach is to 
use a variant of Broyden’s method [26], or other quasi-Newton methods, which do not require 
the explicit evaluation of a derivative. By stacking the columns of our matrices atop each 
other we can turn n x n matrices into n 2 -dimensional vectors (this is the “vec” operation, 
equivalent to replacing pairs of indices, ij, by a super-index, I) and the problem reduces to 
a standard one of simultaneously solving multiple non-linear equations. Various efficient and 
general algorithms for this problem exist, however convergence can be difficult to achieve 
when the number of terms in the sum and the dimension of the matrices becomes large. 

The method we have used is a fixed point algorithm due to ref. [24], If we rotate our 
bases by 1 ^ 2 to set fli = 1 and left and right multiply by Cl 1 /' 2 and Q -1 / 2 we can write 
equation C.l as 

m 

LogfJ = — £ u' k Log (fi 1 / 2 ^ 1 / 2 ) (C.2) 

k =2 

where hats denote the matrices in rotated coordinates and the u' k are u k divided by 1 — 
u k • Ref. [24] shows that this can be solved for S = Logfl using a fixed point iteration 

m 

S e+1 = aS £ + {a - 1) u' k Log (exp{S e /2)C k exp(S £ /2)) (C.3) 

k =2 


starting at S° = u' k Logf^fc. This converges for a = (m + a)/(m + 0 + 1) for any integer 
a > 0 and we have found that convergence is typically very rapid. In fact for the situations 
we have explored the starting guess is already a good approximation to the geometric mean. 
When checking for convergence it is easier to use the symmetric form 

m 

fl 1 / 2 G(fl)L?- 1/2 = 0 = J2 WfcLog (C.4) 

k =1 

for which the arguments of the Log are explicitly SPD, which allows for quick computation. 
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